CzarniakMichelSedar Github Repository

1 Research Rationale

The impetus behind this research lies in the interplay between socioeconomic factors and power plants. As the global energy landscape shifts toward sustainability, it becomes paramount to understand how these changes might affect communities, particularly lower-income ones. This study applies a socio-spatial lens to uncover patterns of power plant distribution across North Carolina. With 100 counties and a total of 843 power plants consisting of 1190 generators (US EPA, 2021), the state offers a rich context for investigation. The overarching question is whether certain communities bear a disproportionate burden of environmentally impactful energy sources, which frames the work within environmental justice and equity.

As a key aspect of this exploration, the research considers how power plant location and emissions relate to various socioeconomic indicators, including but not limited to unemployment rates. This is motivated by the awareness that power plants have often served as major employers in low-income areas (Union of Concerned Scientists, 2021). The research extends its focus beyond energy sources to examine the broader impacts of power plants on health and social vulnerability, and it examines disparities in pollution exposure and associated health risks at a high level, using publicly available data. By drawing on several socioeconomic indicators, this research contributes to a more nuanced understanding of the multifaceted impacts of power plants.

The following research questions are addressed:
1. How does income impact power plant characteristics at the county level?
2. Do power plant retirements have a significant impact on unemployment?
3. Does publicly available data show a relationship between power generation and social vulnerability, health vulnerability, or environmental burden?

2 Dataset Information

Data used in this analysis comes from a variety of sources. A brief overview of each source is given below, and full citations can be found at the end of this report. More information on the metadata for each source can be found in the Metadata folder on GitHub.

2.0.1 Emissions & Generation Resource Integrated Database (eGRID)

The study utilizes the 2020 and 2021 eGRID provided by the U.S. Environmental Protection Agency to acquire essential power plant characteristics. This annually updated database offers comprehensive details on emissions, emission rates, generation, heat input, resource mix, location, and various other indicators. For the specific analyses undertaken in this research, key data elements employed included the plant Federal Information Processing Standards (FIPS) code, generation, nameplate capacity, and emissions by plant.

To process this dataset, the plant and generator sheets are modified individually before being integrated. For the plant sheet, the columns relevant to the analyses are selected, an additional column is introduced to consolidate the state and county FIPS codes into a unified county FIPS code, and the dataset is filtered to North Carolina only. A parallel procedure is executed for the generator sheet before the two datasets are merged using the unique ORISPL code.
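The steps above can be sketched as follows. This is a minimal illustration in plain Python, not the project's own code: the rows are hypothetical, and the column names (PSTATABB, FIPSST, FIPSCNTY, PLNGENAN, GENID, NAMEPCAP) are assumed to match the eGRID sheets and should be checked against the file actually used.

```python
# Hypothetical rows standing in for the eGRID plant and generator sheets.
# Column names are assumptions; values are illustrative only.
plants = [
    {"ORISPL": 1001, "PSTATABB": "NC", "FIPSST": 37, "FIPSCNTY": 1, "PLNGENAN": 1200.0},
    {"ORISPL": 1002, "PSTATABB": "NC", "FIPSST": 37, "FIPSCNTY": 153, "PLNGENAN": 530.5},
    {"ORISPL": 2001, "PSTATABB": "SC", "FIPSST": 45, "FIPSCNTY": 7, "PLNGENAN": 75.0},
]
generators = [
    {"ORISPL": 1001, "GENID": "G1", "NAMEPCAP": 50.0},
    {"ORISPL": 1001, "GENID": "G2", "NAMEPCAP": 50.0},
    {"ORISPL": 1002, "GENID": "G1", "NAMEPCAP": 20.0},
    {"ORISPL": 2001, "GENID": "G1", "NAMEPCAP": 10.0},
]


def county_fips(state_code, county_code):
    """Combine state and county FIPS parts into a zero-padded 5-digit string."""
    return f"{int(state_code):02d}{int(county_code):03d}"


# Keep North Carolina plants only and attach the unified county FIPS code.
nc_plants = {
    p["ORISPL"]: {**p, "FIPS": county_fips(p["FIPSST"], p["FIPSCNTY"])}
    for p in plants
    if p["PSTATABB"] == "NC"
}

# Merge generators onto plants by ORISPL (one output row per generator).
plant_gen = [
    {**nc_plants[g["ORISPL"]], **g}
    for g in generators
    if g["ORISPL"] in nc_plants
]
```

Storing the FIPS code as a zero-padded string rather than a number keeps leading zeros intact for states whose codes start with 0, which matters when the same key is later used to join other county-level tables.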

It is important to highlight that although the most recent eGRID database available was from 2021, this version lacked historical generator retirement data. Consequently, the 2020 eGRID database is incorporated to facilitate the analysis of unemployment trends related to generator retirements.

2.0.2 Economic Research Service Datasets: Median Household Income

To link power plants to socioeconomic factors, household income data is taken from datasets compiled by the Economic Research Service of the United States Department of Agriculture. This data includes metrics such as household income and poverty rates at the county level, identified by the Federal Information Processing Standards (FIPS) code. Preparing this dataset involves extracting the pertinent columns, specifically median household income, and then integrating them with the initial eGRID dataframe. The FIPS code serves as the common column linking the two datasets.

2.0.3 Economic Research Service Datasets: Unemployment

As an additional facet of investigation, the analysis incorporates unemployment data obtained from the Economic Research Service of the United States Department of Agriculture. The dataset, spanning the years 2000 through 2021, comprises comprehensive information on the total labor force, the number of employed individuals, the number of unemployed individuals, and the unemployment rate. The data wrangling process for this dataset unfolds in two distinct stages. Initially, the columns of interest, specifically those pertaining to the unemployment rate, are selected. Following this, the identified columns undergo appropriate conversions to their intended data types before being merged with the existing dataset, utilizing the county FIPS code as the linking identifier.

The subsequent phase of data wrangling introduces additional steps specifically tailored for conducting regression analyses between generator retirements and unemployment rates. To facilitate this analysis, the code executes a transposition of the data. This transformation presents North Carolina county unemployment rates by year in a more streamlined format, allocating a dedicated column for each year.
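A minimal sketch of this reshaping step is shown below, assuming the unemployment rates start as one record per county and year. The starting layout, the county codes, and the rates are all hypothetical and only illustrate the one-column-per-year result.

```python
# Hypothetical long-format records: (county FIPS, year, unemployment rate in %).
long_rows = [
    ("37153", 2019, 5.1),
    ("37153", 2020, 9.0),
    ("37183", 2019, 3.4),
    ("37183", 2020, 6.3),
]


def to_wide(rows):
    """Return (sorted years, {fips: [rate per year]}), one column per year."""
    years = sorted({year for _, year, _ in rows})
    by_county = {}
    for fips, year, rate in rows:
        by_county.setdefault(fips, {})[year] = rate
    # Missing county-year combinations become None rather than being dropped.
    wide = {fips: [rates.get(y) for y in years] for fips, rates in by_county.items()}
    return years, wide


years, wide = to_wide(long_rows)
```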

2.0.4 Social Vulnerability Index

This dataset contains 2020 data from the U.S. CDC on key vulnerability criteria, including socioeconomic status, household characteristics, racial & ethnic minority status, and housing type & transportation. Four theme variables contribute to the overall vulnerability index value: Socioeconomic Status (RPL_THEME1), Household Characteristics (RPL_THEME2), Racial & Ethnic Minority Status (RPL_THEME3), and Housing Type & Transportation (RPL_THEME4). The variables that contribute to each of these themes are estimated using the American Community Survey (ACS), 2016-2020 (5-year data). U.S. tracts are ranked based on percentiles. Percentile ranking values range from 0 to 1, with higher values indicating greater vulnerability. Percentile ranks are available for the 16 individual variables that feed into the four themes, for the four themes themselves, and for overall vulnerability.
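To make the 0-to-1 percentile ranking concrete, the sketch below uses the convention (rank - 1) / (N - 1), with tied values sharing the smallest rank. We understand this to be the convention described in the SVI documentation, but it is stated here as an assumption, and the input scores are hypothetical.

```python
def percentile_ranks(values):
    """Percentile rank as (rank - 1) / (N - 1); ties share the smallest rank.

    Assumes at least two values. The tie rule is an assumption for illustration.
    """
    n = len(values)
    first_rank = {}
    for position, value in enumerate(sorted(values), start=1):
        # setdefault keeps the first (smallest) rank seen for a tied value.
        first_rank.setdefault(value, position)
    return [(first_rank[v] - 1) / (n - 1) for v in values]


# Hypothetical tract-level scores, where higher means more vulnerable.
scores = [12.0, 30.5, 18.2, 30.5, 7.1]
ranks = percentile_ranks(scores)
```

Under this convention the lowest score always maps to 0, and the highest maps to 1 only when it is not tied with another value.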

Initially, this data set was chosen because it contains socioeconomic information at the county and tract level that was not available for 2020 in other sources we reviewed. Once we began to explore the dataset, and learned it contained variables we did not previously know were publicly available at this resolution, we chose to continue working with it beyond just the unemployment and income information we were initially seeking, and expanded to consider relationships between the broader set of variables it uses.

2.0.5 Environmental Justice Index

This dataset comes from a 2022 report published by the U.S. CDC on environmental burden, social vulnerability, and health vulnerability indicators. This work builds on the Environmental Protection Agency (EPA)’s EJSCREEN. The indicators (social vulnerability, environmental burden, health vulnerability, and the variables that contribute to each of them) were selected based on a literature review conducted between December 2020 and December 2021. The Environmental Justice Index describes itself as “the first national, place-based tool designed to measure the cumulative impacts of environmental burden through the lens of human health and health equity.”

After learning about the Social Vulnerability Index, we continued exploring the information available on the CDC’s Agency for Toxic Substances and Disease Registry site. Upon finding the EJI, we decided to test our hypotheses that generation and emissions from power plants would be correlated with increased environmental burden and health vulnerability in North Carolina in 2021. It is crucial to note that the EJI, like any tool produced at the national scale, has limitations tied to the resolution at which it provides information. We believe it is important to highlight, from its data documentation, that “injustice occurs locally. High-level tools such as the EJI cannot capture all social, environmental, or health issues that a community may face.” Detail on limitations and considerations can be found on page 7 of the EJI data documentation in the Metadata folder on GitHub.

2.0.6 Spatial Data

County-level visualizations use 2020 data from the Centers for Disease Control and Prevention/Agency for Toxic Substances and Disease Registry/Geospatial Research, Analysis, and Services Program, together with the “cb_2018_us_county_20m” shapefile, a cartographic boundary file from the United States Census Bureau. 2018 was the most recent year available for Cartographic Boundary File downloads.

Table 1: Dataset Information
Dataset                      Attribute       Description
eGRID                        Key Variables   Plant locations, production, capacity, retirements, and emissions
eGRID                        Data Hierarchy  Generators are associated to plants, plants are associated to counties and FIPS codes
eGRID                        Data Range      2020 (dataset used only to reference historical retirement data) and 2021 (all other data)
eGRID                        Data Source     The United States Environmental Protection Agency
Income                       Key Variables   Median Household Income, FIPS codes
Income                       Data Hierarchy  Income data is associated with FIPS county code
Income                       Data Range      2021
Income                       Data Source     The Economic Research Service of the United States Department of Agriculture
Unemployment                 Key Variables   Unemployment rate, FIPS code
Unemployment                 Data Hierarchy  Unemployment rates are associated with FIPS county code
Unemployment                 Data Range      2000 to 2021
Unemployment                 Data Source     The Economic Research Service of the United States Department of Agriculture
Social Vulnerability Index   Key Variables   Socioeconomic status, household characteristics, racial & ethnic minority status, and housing type & transportation
Social Vulnerability Index   Data Hierarchy  Indicator data is available at the tract level
Social Vulnerability Index   Data Range      2020
Social Vulnerability Index   Data Source     The United States Centers for Disease Control and Prevention/Agency for Toxic Substances and Disease Registry Social Vulnerability Index (CDC/ATSDR SVI)
Environmental Justice Index  Key Variables   Environmental burden, social vulnerability, and health vulnerability
Environmental Justice Index  Data Hierarchy  Indicator data is available at the tract level
Environmental Justice Index  Data Range      2020 to 2021
Environmental Justice Index  Data Source     The United States Agency for Toxic Substances and Disease Registry

3 Data Wrangling

Once the data was loaded, the initial step in our research was data wrangling on the eGRID, unemployment, and income datasets. Data wrangling, the process of transforming raw data into a usable form, was a crucial step in this research because the datasets are so varied. For the eGRID dataset, the code selects specific columns related to power plant information in North Carolina, converts certain columns to numeric format, and then combines the generator and plant data based on a common identifier. Duplicate columns are removed, resulting in the plant_gen_data dataframe. The code then handles the unemployment dataset, selecting relevant columns covering the years 2000-2021 and merging them with the plant_gen_data dataframe using the FIPS code as a key. Similarly, the income dataset is processed by selecting specific columns, and the final step merges this income data with the previously created dataframe. The resulting dataset, named processed_data, contains information on power plants, unemployment rates, and income, and its column names are displayed at the end of the code.

The SVI dataset was available at the county level, so wrangling it was simpler than the process for EJI, which follows. After import, the SVI variables of interest were read into a data frame. Plant generation data was then grouped by county, summarized, and merged with the county-level SVI data.

The data wrangling process for the EJI data was complex, so the major steps are described here. For Q3, the first step was to establish a shared GEOID format. Next, 2021 spatial census data for North Carolina was retrieved with tigris::tracts to match tract-level GEOIDs with their respective geometry. Coordinate systems were checked throughout to ensure alignment. The EJI attributes were then joined to the tract spatial features by the GEOID field. Some tracts did not have EJI data. Variables of interest were then selected within the newly created tracts_EJI_sf_select. The next step was to match power plant data to tracts, which was possible because LAT and LON were available for the plants. After checking coordinate systems (NAD1983, EPSG 4269), the coordinates of the power plants in North Carolina were matched to tract geometry via st_intersection with tracts_EJI_sf_select, producing intersection_plants_EJI.

To get tract-level values for generation and emissions, which at that point existed only at the plant level, intersection_plants_EJI was grouped by GEOID and summarized so that the variables of interest were summed from the plant level to the tract level. For example, the new column “Tract_Generation” holds, for each tract, the sum of generation across all power plants in that tract. With the addition of the tract-level EJI variables of interest, analyses of the relationships between emissions, generation, and tract-level EJI data could finally be run. Working at the tract level was important because the EJI data documentation discourages adding up tract-level values to get to the county level. Having all information at a common spatial resolution (that is, being able to identify which tract a power plant was in) was a crucial foundation for running analyses. Lastly, NAs were dropped for variables of interest.
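The group-and-sum step can be sketched as below, written in plain Python for illustration rather than taken from the project's code. The rows, the field name generation_mwh, and the rule of skipping missing values before summing are assumptions; the spatial match itself is taken as already done.

```python
from collections import defaultdict


def normalize_geoid(geoid):
    """Render a tract GEOID as an 11-character, zero-padded string."""
    return str(geoid).strip().zfill(11)


# Hypothetical plant rows after the spatial match; values are illustrative only.
plants_in_tracts = [
    {"GEOID": 37153970100, "generation_mwh": 1000.0},
    {"GEOID": "37153970100", "generation_mwh": 250.5},
    {"GEOID": "37035010100", "generation_mwh": 40.0},
    {"GEOID": "37035010100", "generation_mwh": None},  # missing value, skipped
]


def sum_to_tract(rows, field):
    """Sum a plant-level field to the tract level, keyed by normalized GEOID."""
    totals = defaultdict(float)
    for row in rows:
        if row[field] is None:
            continue
        totals[normalize_geoid(row["GEOID"])] += row[field]
    return dict(totals)


tract_generation = sum_to_tract(plants_in_tracts, "generation_mwh")
```

Normalizing the GEOID before grouping means a tract stored as a number in one table and as text in another still collapses to a single key.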

4 Exploratory Analysis

This analysis initiates data exploration and visualization for the processed_data dataframe, focusing on fuel types, income, and electricity generation in North Carolina. Figure 4.1 generates a pie chart to illustrate the percentage makeup of primary fuel types in North Carolina based on total annual generation. Subsequently, the scatter plot in Figure 4.2 depicts the relationship between median household income and total plant annual generation, with points colored by fuel type. The exploratory analysis further drills down to scatter plots excluding nuclear data and for a selected set of fuel types (COAL, SOLAR, OIL, WIND, GAS). Additionally, stacked bar plots in Figures 4.5 and 4.6 visualize the total annual generation by fuel type and county, and for better clarity, separate plots are created for the top 10 and bottom 10 counties based on median household income. These visualizations help explore and understand the relationships between fuel types, income levels, and electricity generation in different counties of North Carolina.

Figure 4.1: Pie chart illustrating the percentage distribution of primary fuel types in North Carolina based on total annual generation.

Figure 4.2: Scatter plot illustrating the correlation between median household income and total annual generation, color-coded by fuel type. Owing to the unique characteristics of nuclear power plants, operating at significantly higher capacity factors than other sources, the plot lacks a discernible relationship, prompting necessary adjustments.

Figure 4.3: Exploring energy patterns, this scatter plot analyzes the connection between median household income and total annual generation, excluding nuclear energy. The updated chart reveals a potential trend, especially around the $70,000 median income mark, where data points for generation are sparse.

Figure 4.4: To narrow our exploration further, we selected key energy sources: coal, gas, oil, wind, and solar.

Figure 4.5: Stacked bar plot illustrating the total annual generation by fuel type in the top 10 counties (by median income) in North Carolina.

Figure 4.6: To get a balanced view of the total annual generation by fuel type and county, we then show the bottom 10 counties (by median income) in North Carolina. This identified Richmond county as a county of interest.

We then created initial maps to better understand the landscape of North Carolina in terms of socioeconomic status and the geographic distribution of power plants. Figure 4.7 maps the locations of power plants and helped us visualize where they are concentrated across the state. Figure 4.8 maps the nameplate capacity of those power plants by county. We noticed that multiple counties, especially Richmond County, stood out as having a high concentration of capacity. Figure 4.9 shows a heat map of the percentage of each county’s population living in poverty as of 2021. Richmond County is once again a clear visual outlier on this map.

Figure 4.7: Locations of active power plants across North Carolina as of 2021. Each black dot represents a power plant.

Figure 4.8: This map shows the total nameplate capacity in megawatts (MW) of each county in North Carolina. Counties with no power plants are left blank. Richmond County is a notable outlier, with the highest capacity of all the counties.

Figure 4.9: Percent of total county population living in poverty in 2021, as defined by the USDA Economic Research Service.

As a final stage of our exploratory analysis, we explored power plant emissions geographically. Figures 4.10, 4.11, 4.12, and 4.13 show the aggregated power plant emissions per county in 2021 for CO2, NOx, SO2, and CH4, respectively. eGRID did not provide data for mercury (Hg) in 2021, so it was not analyzed. CH4 is reported in pounds (lbs), while CO2, NOx, and SO2 are reported in short tons. For CO2 emissions, Richmond County stood out. For NOx emissions, Catawba County stood out. For SO2 emissions, Catawba, Haywood, and Person counties stood out. For CH4 emissions, Catawba and Person counties stood out.

Figure 4.10: Total CO2 Emissions in 2021 across North Carolina Power Plants.

Figure 4.11: Total NOx Emissions in 2021 across North Carolina Power Plants.

Figure 4.12: Total SO2 Emissions in 2021 across North Carolina Power Plants

Figure 4.13: Total CH4 Emissions in 2021 across North Carolina Power Plants.

As exploratory work into the dimensions of social vulnerability, environmental burden, and health vulnerability in relation to generation and emissions across power plants in North Carolina, tests for normality were conducted on the overall index value for each of the aggregated themes (SVI, EBM, HVM). The distribution of the variables corresponding to these three overall themes of interest can be seen in Figures 4.14, 4.15, and 4.16.

Figure 4.14: Test for normality in distribution of Social Vulnerability Module percentile ranks. Normal Q-Q Plot does not show a totally normal distribution. Distribution is skewed such that most observations fall in the second half.

Figure 4.15: Test for normality in distribution of Environmental Burden Module percentile ranks. Normal Q-Q Plot does not show a totally normal distribution. Distribution is skewed such that most observations fall in the second half.

Figure 4.16: Test for normality in distribution of Health Vulnerability Module percentile ranks. The distribution shows that HVM (as percentile rank) is ordered but not continuous, unlike EBM. EJI obtains its data on indicators from a variety of sources. RPL_HVM, specifically, can take a value of 0, 0.2, 0.4, 0.6, 0.8, or 1.0.

As the last set of geographic explorations, two maps were created. The first overlays NC power plants on tract boundaries to confirm that the spatial matching of power plants to tracts for the Q3 analyses was successful (see Figure 4.17). The second maps asthma prevalence across NC tracts as a visual grounding point for the Q3 analyses, given that the emissions considered in this project are known to have respiratory effects in humans (see Figure 4.18).

Figure 4.17: Power Plants Across North Carolina Tracts

Figure 4.18: Mapping Asthma Prevalence Across Tracts

5 Question 1: How does income impact power plant characteristics at the county level?

5.0.1 Null Hypothesis: The economic status of a community does not bear a statistically significant impact on the features of its power infrastructure.

This analysis begins by examining the impact that income has on various power plant characteristics. This first analysis uses income as the highest-level socioeconomic indicator, as income often dictates an array of follow-on indicators. Several regressions are conducted including the impact of income on the fuel type (differentiating between clean and dirty sources), number of plants, and nameplate capacity.

Examining only the impact of income on coal plants, the estimated coefficient is approximately -134.6 (see Table 5.1). This suggests that for each one-dollar increase in median household income, predicted coal generation decreases by approximately 134.6 MWh. The coefficient’s p-value (0.1049) is greater than 0.05, so the relationship is not statistically significant at the 0.05 level. Furthermore, the R^2 indicates that only about 13.24% of the variability in coal generation is explained by income. Figure 5.1 below visualizes this relationship.


Table 5.1: Linear Regression Results - Median Income Impact on Coal Generation
Term           Estimate      Std. Error    t value    Pr(>|t|)
(Intercept)    11787072.226  4.751342e+06  2.480788   0.0226378
MEDHHINC_2021  -134.575      7.904056e+01  -1.702607  0.1049460
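As a quick reading aid for the regression tables, the short calculation below reproduces the reported t value for MEDHHINC_2021 from its estimate and standard error, and rescales the slope to a $1,000 change in income. The numbers are taken from Table 5.1; the $1,000 step is only an illustrative choice.

```python
# Values copied from Table 5.1 (coal generation regressed on median income).
estimate = -134.575    # MWh change per one-dollar change in income
std_error = 79.04056   # 7.904056e+01 in the table

# The t value is the estimate divided by its standard error.
t_value = estimate / std_error

# Rescale the slope to a $1,000 income step for easier interpretation.
change_per_1000_dollars = estimate * 1000

print(f"t value: {t_value:.4f}")
print(f"Predicted change per $1,000: {change_per_1000_dollars:,.0f} MWh")
```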

Figure 5.1: Relationship between median household income ($) and coal generation (MWh) in 2021 by county.

The linear regression shown in Figure 5.2 below suggests that there might be a relationship between income and solar generation. The statistically significant p-value (0.04707) in Table 5.2 provides evidence that an increase in median household income is associated with a decrease in predicted solar generation. However, the explained variance is quite low (about 0.56%), so other factors not included in the model likely account for most of the variability in solar generation.


Table 5.2: Linear Regression Results - Median Income Impact on Solar Generation
Term           Estimate        Std. Error    t value    Pr(>|t|)
(Intercept)    23386.1972482   4662.3424985  5.015976   0.0000007
MEDHHINC_2021  -0.1707202      0.0858266     -1.989128  0.0470720

Figure 5.2: Relationship between median household income ($) and solar generation (MWh) in 2021 by county.

The next stage of analysis examined whether there was a relationship between the number of power plants and median household income. The linear regression results indicate that median household income does not show a statistically significant association with the number of plants. The coefficient is estimated to be -0.0000786, with a standard error of 0.0000922, a t-value of -0.85, and a p-value of 0.3962 (see Table 5.3). This suggests that, based on the available data, changes in median household income are not statistically linked to variations in the number of plants. Figure 5.3 below illustrates this relationship.


Table 5.3: Linear Regression Results - Median Income Impact on Number of Plants
Term           Estimate    Std. Error  t value     Pr(>|t|)
(Intercept)    17.5710566  5.2413343   3.3524015   0.0011875
MEDHHINC_2021  -0.0000786  0.0000922   -0.8526597  0.3961893

Figure 5.3: Relationship of Median Household Income and Number of Power Plants in NC in 2021 by county.

As a last step toward answering Question 1, and given that power plants differ in size, an analysis was conducted on whether the average nameplate capacity of plants in a county is related to median household income. The linear regression on this relationship yields a coefficient of 0.0019239, with a standard error of 0.0023465 (see Table 5.4). Taken at face value, this implies that an increase in median household income is associated with a slight increase in nameplate capacity. However, the non-significant p-value of 0.4145 means we do not have sufficient evidence to reject the null hypothesis, and the observed association between median household income and nameplate capacity may be coincidental. Figure 5.4 below illustrates this relationship.


Table 5.4: Linear Regression Results - Median Income Impact on Nameplate Capacity
Term           Estimate    Std. Error   t value    Pr(>|t|)
(Intercept)    10.0045289  133.4423544  0.0749727  0.9404086
MEDHHINC_2021  0.0019239   0.0023465    0.8198774  0.4145266

Figure 5.4: Relationship between Median Household Income and Average Nameplate Capacity (MW) by county in 2021.

5.1 Summary

This study’s initial inquiry sought to uncover how income, a significant socioeconomic indicator, relates to power plant characteristics. The investigation focused on the fuel type, the number, and the average nameplate capacity of power plants, each representing a distinct dimension of interest. The study did not produce statistically significant results regarding the influence of income on the number of power plants or on nameplate capacity, which may reflect the complexity of the energy landscape. While one might expect lower generation capacity and fewer power plants in wealthier areas due to potential opposition, energy consumption is often linked to a higher standard of living. Consequently, generation capacity in higher-income areas may be necessary to meet increased demand. By contrast, the type of fuel generation, specifically solar, displayed some significance: an increase in median household income is associated with a decrease in solar generation. However, despite being statistically significant, this relationship explains very little of the variance, with a coefficient of determination (R^2) of only 0.56%.

6 Question 2: Do power plant retirements have a significant impact on unemployment?

6.0.1 Null Hypothesis: The retirement of power plant generators does not have an impact on unemployment rates in the counties in which they are located.

The primary motivation behind this analysis is to delve into the nuanced equilibrium inherent in a just energy transition. The imperative of shifting energy fuel sources from carbon-intensive to fossil fuel-free must be achieved without detriment to communities. In this examination, we initially pinpoint counties in North Carolina with the highest and lowest numbers of retirements, comparing them to their respective unemployment rates over time. To conduct a more pointed analysis, a statistical examination is performed on a specific county of interest.

Power plant retirements were analyzed (using data contained in the 2020 eGRID dataset) to identify which counties had the greatest number of retirements. The full list of counties with historical retirements can be seen below. Counties with few retirements were also selected to provide a comparison.

Table: Number of historical power plant retirements per county based on 2020 eGRID dataset

## FIPS Code: 37019 has 2 rows.
## FIPS Code: 37021 has 2 rows.
## FIPS Code: 37023 has 1 rows.
## FIPS Code: 37035 has 2 rows.
## FIPS Code: 37041 has 1 rows.
## FIPS Code: 37045 has 5 rows.
## FIPS Code: 37047 has 3 rows.
## FIPS Code: 37049 has 1 rows.
## FIPS Code: 37051 has 4 rows.
## FIPS Code: 37063 has 1 rows.
## FIPS Code: 37067 has 1 rows.
## FIPS Code: 37069 has 1 rows.
## FIPS Code: 37071 has 5 rows.
## FIPS Code: 37075 has 4 rows.
## FIPS Code: 37117 has 4 rows.
## FIPS Code: 37125 has 1 rows.
## FIPS Code: 37129 has 6 rows.
## FIPS Code: 37145 has 2 rows.
## FIPS Code: 37155 has 6 rows.
## FIPS Code: 37157 has 6 rows.
## FIPS Code: 37159 has 7 rows.
## FIPS Code: 37191 has 7 rows.

Based on the dataset above, FIPS Codes 37159 (Rowan) and 37191 (Wayne) have the highest number of retirements, at 7 rows each. FIPS Codes 37023 (Burke), 37041 (Chowan), 37049 (Craven), 37063 (Durham), 37067 (Forsyth), 37069 (Franklin), and 37125 (Moore) have 1 row each.
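The selection of the highest- and lowest-retirement counties can be reproduced directly from the counts listed above. The sketch below is a plain Python illustration, not the code that generated the listing; it assumes the row counts per FIPS code are already available as a dictionary.

```python
# Row counts per county FIPS code, copied from the listing above.
retirements = {
    "37019": 2, "37021": 2, "37023": 1, "37035": 2, "37041": 1, "37045": 5,
    "37047": 3, "37049": 1, "37051": 4, "37063": 1, "37067": 1, "37069": 1,
    "37071": 5, "37075": 4, "37117": 4, "37125": 1, "37129": 6, "37145": 2,
    "37155": 6, "37157": 6, "37159": 7, "37191": 7,
}

most = max(retirements.values())
fewest = min(retirements.values())

# Counties tied for the highest and lowest number of retirements.
highest_counties = sorted(f for f, n in retirements.items() if n == most)
lowest_counties = sorted(f for f, n in retirements.items() if n == fewest)

print("Highest:", highest_counties, "with", most, "rows each")
print("Lowest:", lowest_counties, "with", fewest, "row each")
```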

In this analysis, counties with high (blue) and low (green) numbers of generator retirements are singled out: Rowan and Wayne (high retirements) and Durham and Chowan (low retirements). The objective is to investigate whether fluctuations in unemployment rates are influenced by generator retirements or whether the rates follow similar trends irrespective of such retirements. Through this exploratory analysis, the goal is to discern any distinctive patterns in unemployment dynamics. Because the plot in Figure 6.1 below shows instances in which unemployment rates do not follow the hypothesized pattern, another variable, the nameplate capacity of retired generators, was chosen for statistical analysis.

Figure 6.1: Historical unemployment rates in Wayne, Rowan, Durham, and Chowan Counties

6.1 Summary

The examination of power plant generator retirements did not produce definitive results; nevertheless, it contributes to the growing body of work that applies an environmental justice perspective to the energy transition. This research looks beyond reductions in tons of carbon dioxide equivalent as the measure of the energy transition’s success. It underscores the importance of a holistic approach to key performance indicators, particularly given that large power plants are often situated in rural areas where communities depend heavily on utilities for employment (Union of Concerned Scientists, 2021). Consistent with good practice, the analysis included counties with varying numbers of retirements to help account for factors external to the retirements themselves. It also considered the nameplate capacity of retired assets, on the reasoning that a smaller plant plausibly employs fewer people than a larger one. Contrary to the initial hypothesis, the findings show a statistically significant relationship in which each additional megawatt of retired nameplate capacity is associated with a marginal decrease in the unemployment rate. However, the data limitations inherent in this relatively small sample call for cautious interpretation. Although this analysis falls short of conclusive results, it provides a foundation for subsequent analyses and further environmental justice research.

7 Question 3: Does publicly available data show a relationship between power generation and social vulnerability, health vulnerability, or environmental burden?

7.0.1 Null hypothesis: There is no relationship between power plant distribution and social vulnerability, health vulnerability, or environmental burden.

Having explored two major socioeconomic dimensions in relation to total annual generation, the following sections expand upon income and unemployment to consider social vulnerability, environmental burden, and health vulnerability as the final topics in scope for this project’s learning objectives.

7.0.1.1 Sub-question: Does publicly available data show a relationship between power generation and social vulnerability?

A set of analyses was conducted to briefly explore whether there is a significant relationship between power plants and the Social Vulnerability Index (SVI) more broadly. A linear regression of total annual generation by county on the overall SVI percentile rank (RPL_THEMES) produced the results summarized in Table 7.1.


Table 7.1: Linear Regression Results - SVI and Total Generation
Term          Estimate   Std. Error   t value     Pr(>|t|)
(Intercept)   3820133    3462723      1.1032162   0.2735081
RPL_THEMES    3643958    5744791      0.6343064   0.5278370

Taken alone, the positive coefficient of 3643958 suggested that a higher overall SVI percentile rank (RPL_THEMES) is associated with greater generation. However, the multiple R-squared value was very small (0.005408), signaling that this model does not fit the data well; with an R-squared so low, RPL_THEMES explains only about 0.5% of the variation in total generation by county across North Carolina in 2021. Ultimately, the coefficient was not statistically significant (p = 0.5278 > 0.05). The null hypothesis that there is no relationship between generation by county and RPL_THEMES in North Carolina in 2021 therefore could not be rejected. As is apparent in Figure 7.1, the overall SVI percentile rank cannot be concluded to have a significant relationship with total power generation at the county level within this dataset.
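For readers who want to see how the quantities in a table like Table 7.1 are derived, the following is a minimal sketch of a single-predictor least-squares fit. It is written in Python using only the standard library, which is an assumption made for illustration; the function name `simple_ols` and the toy data are hypothetical and are not the project's county data or its original code.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns the slope, intercept, standard error of the slope,
    t value of the slope, and R-squared.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual and total sums of squares.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)

    r_squared = 1 - sse / sst
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_value = slope / se_slope
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "t_value": t_value,
        "r_squared": r_squared,
    }


# Hypothetical toy data: x stands in for a percentile rank-like predictor,
# y for a generation-like response. These are NOT the study's values.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = simple_ols(x, y)
print(fit)
```

The sketch stops at the t value because the p-value requires the t distribution, which the Python standard library does not provide. In a single-predictor model the t value and R-squared are linked by R² = t² / (t² + n − 2), so a small t value and a small R-squared, as in Table 7.1, are two views of the same weak fit.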

Figure 7.1: The relationship between the Social Vulnerability Index (SVI) and 2021 annual generation in MWh across all power plants in a county.

7.0.1.2 Sub-question: Does publicly available data show a relationship between power generation and environmental burden?

Power plants have historically been a major stationary point source of air pollution, including, for example, SO2 and NO2, and both SO2 and NOx can contribute to the formation of atmospheric ozone and particulate matter (Source: “Human Health & Environmental Impacts of the Electric Power Sector”). The relationship between power plant generation and emissions and the environmental burden module (EBM) was therefore explored next. First, a simple linear regression was run to examine whether the overall environmental burden module percentile rank, RPL_EBM, could be explained at all by total generation per tract. Generation did not ultimately perform as a significant explanatory variable for RPL_EBM; the p-value of 0.93 was well above the 0.05 threshold for significance, and the R-squared value was very low (0.000025), as seen in Table 7.2.


Table 7.2: Linear Regression Results - Environmental Burden and Total Generation
Term               Estimate    Std. Error   t value      Pr(>|t|)
(Intercept)        0.4060836   0.0144914    28.0222975   0.0000000
Tract_Generation   0.0000000   0.0000000    -0.0868754   0.9308285

Another linear regression, run on nameplate capacity (maximum possible generation), returned an R-squared of 0.00027 and a p-value of 0.77, meaning nameplate capacity could not be considered a worthwhile explanatory variable for the overall environmental burden module either. Ultimately, these analyses were insufficient to reject the null hypothesis that there is no relationship between environmental burden and either nameplate capacity or power generation by tract, so no relationship can be concluded. The lack of a clear relationship is demonstrated in Figure 7.2 below.

Figure 7.2: The relationship between the overall Environmental Burden Module (EBM) percentile rank and total power generation by tract.

7.0.1.3 Sub-question: Does publicly available data show a relationship between power generation and human health vulnerability?

As the last high-level, aggregated theme index value to test in order to answer Question 3, analyses were conducted to test whether a statistically significant relationship could be determined between the health vulnerability module (HVM) and total power generation (actual or maximum possible). As mentioned in the Exploratory Analysis section and illustrated in Figure 4.16, health vulnerability as a variable exists in this data set at values of 0.0, 0.2, 0.4, 0.6, 0.8, and 1.0 (percentile rank). The specific methodology for calculating Health Vulnerability can be found on page 16 of the EJI data documentation in the Metadata folder on GitHub. Treating the variable as continuous proved unhelpful toward rejecting a null hypothesis that there is no relationship between tract-level HVM and total generation per tract (see Figure 7.3 for a visual representation of the linear regression treating RPL_HVM as continuous). A series of one-way ANOVA analyses therefore followed in which the variable was treated as categorical. The objective was to see whether total generation per tract (the response) differs meaningfully across percentile ranks of HVM (RPL_HVM, the factor, with each rank value as a level) by testing whether mean generation is equal across all percentile rank categories. It is important to note that the percentile ranks do not have equal sample sizes and that a normal distribution cannot be assumed (as seen in Figure 4.16). The mean RPL_HVM across tracts is 0.56 for the data relevant to this project.

Figure 7.3: The relationship between the overall Health Vulnerability Module (HVM) percentile rank and total power generation by tract. As evidenced visually, total generation does not explain variability in HVM.

Bartlett’s test was run to check the null hypothesis that the variance is the same at each of the percentile ranks, and yielded a K-squared of 237.49 with 5 degrees of freedom and a p-value < 2.2e-16. This result rejects the hypothesis of equal variances, so the ANOVA results that follow should be interpreted with caution. ANOVA (function ‘aov’) was then run on tract generation and RPL_HVM (see Figure 7.4 below), yielding a p-value well above 0.05 (0.385). The null hypothesis that mean generation is the same across percentile ranks therefore could not be rejected. Lastly, a Tukey HSD test was conducted to examine the differences between pairs of percentile ranks, but every p-value generated was above 0.05.
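The F statistic behind a one-way ANOVA can be sketched in a few lines. The example below is a Python illustration using only the standard library, not the ‘aov’ call used in the analysis; the function name `one_way_anova_f` and the toy generation values grouped by HVM rank are hypothetical.

```python
def one_way_anova_f(groups):
    """Compute the one-way ANOVA F statistic for a list of groups.

    Returns (F, between-group degrees of freedom,
    within-group degrees of freedom).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means)
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data: generation values grouped by HVM percentile rank.
# These are NOT the study's tract-level values.
generation_by_rank = {
    0.2: [1.0, 2.0, 3.0],
    0.6: [2.0, 3.0, 4.0],
    1.0: [3.0, 4.0, 5.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(generation_by_rank.values()))
print(f_stat, df_b, df_w)
```

Converting the F statistic into a p-value requires the F distribution, which is why the sketch stops at the statistic and its degrees of freedom. With six percentile rank levels, as in this data set, the between-group degrees of freedom would be 5.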

Figure 7.4: Analysis of Variance performed on Generation and Health Vulnerability by Tract.


Table 7.3: Linear Regression Results - Health Vulnerability and Total Generation
Term               Estimate    Std. Error   t value      Pr(>|t|)
(Intercept)        0.5513876   0.0187636    29.3859555   0.00000
Tract_Generation   0.0000000   0.0000000    -0.1129067   0.91018

Across regression tests run on the health vulnerability percentile rank (RPL_HVM) and total annual generation by tract, R-squared values were low and p-values were high. The regression results from this data and set of analyses are not sufficient to suggest that total generation can explain RPL_HVM with any significance. The analysis was insufficient to reject the null hypothesis that there is no relationship between health vulnerability and total generation by tract in our dataset. At the tract level across North Carolina in 2021, a relationship cannot be determined between percentile rank for Health Vulnerability and total power generation (see Table 7.3).

7.1 Summary

This study used data from CDC’s Social Vulnerability (county resolution) and Environmental Justice (tract resolution) Indices to test hypotheses that greater generation by power plants is correlated with elevated percentile ranks on social vulnerability, environmental burden, and health vulnerability modules as defined by the CDC. The CDC documentation reminds us that “a percentile ranking represents the proportion of tracts (or counties) that are equal to or lower than a tract of interest in environmental burden. For example, a EJI ranking of 0.85 signifies that 85% of tracts in the nation likely experience less severe cumulative impacts from environmental burden than the tract of interest, and that 15% of tracts in the nation likely experience more severe cumulative impacts from environmental burden” (Source: “Human Health & Environmental Impacts of the Electric Power Sector”). Analyses did not yield results that allow us to reject the null hypothesis that there is no relationship between power plant distribution and social vulnerability, health vulnerability, or environmental burden.
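The percentile ranking definition quoted above can be illustrated with a short calculation. This is a minimal Python sketch of the quoted "equal to or lower than" definition only; it is not CDC's actual ranking procedure, which may handle ties and scaling differently, and the function name `percentile_rank` and the burden scores are hypothetical.

```python
def percentile_rank(scores, score_of_interest):
    """Proportion of scores that are equal to or lower than the given score."""
    at_or_below = sum(1 for s in scores if s <= score_of_interest)
    return at_or_below / len(scores)


# Hypothetical burden scores for ten tracts (NOT real EJI values).
burden_scores = [0.2, 0.5, 0.5, 0.9, 1.4, 2.0, 2.3, 3.1, 3.8, 4.6]

rank = percentile_rank(burden_scores, 3.8)
print(rank)
```

Under this definition, a tract at the top of the distribution receives a rank of 1.0, and tied tracts share the same rank.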

Given the high p-values and low R-squared values from regression analyses run on health vulnerability as an overall theme, one direction for a follow-up study or a project of larger scope would be to dig one or two levels deeper into the aggregated SV, EB, and HV themes. For example, among the emissions types tracked in the eGRID dataset, nitrogen oxides (NOx) and sulfur dioxide (SO2) merit further exploration, as they are tied to criteria air pollutants regulated by EPA. Under the Clean Air Act, US EPA has set National Ambient Air Quality Standards for six criteria air pollutants: particulate matter (PM2.5 and PM10), atmospheric ozone, carbon monoxide, lead, nitrogen dioxide, and sulfur dioxide. Exposure to these pollutants is widespread, affecting millions of people, because of their numerous and diverse sources. Causal or likely causal evidence pointing to a variety of negative health endpoints exists for each one of them. For nitrogen dioxide (NO2) and sulfur dioxide (SO2), the regulatory standard-setting has been most heavily informed by research on respiratory effects. It should be noted that nitrogen oxides (NOx) also contribute to the formation of atmospheric ozone, which causes respiratory problems at elevated exposures. Given these established connections, and in the absence of a strong relationship within this project’s data sets of plant generation and emissions, one might extend this project into an investigation of whether there is a relationship between eGRID power plant data on emissions of SO2 and NOx and health vulnerability data from EJI by tract across North Carolina.

8 References

Supporting the nation’s coal workers and communities in a changing energy landscape. Union of Concerned Scientists. (2021). https://www.ucsusa.org/resources/support-coal-workers#read-online-content.

United States Environmental Protection Agency (EPA). 2023. “Emissions & Generation Resource Integrated Database (eGRID), 2021” Washington, DC: Office of Atmospheric Protection, Clean Air Markets Division. Available from EPA’s eGRID web site: https://www.epa.gov/egrid. Accessed on 11/1/2023.

United States Census Bureau. Cartographic Boundary Files - Shapefile. 2018 County. https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html. Accessed on 11/15/2023.

United States Department of Agriculture / Economic Research Service. Poverty estimates for the U.S., States, and counties, 2021. https://www.ers.usda.gov/data-products/county-level-data-sets/county-level-data-sets-download-data/. Accessed on 11/1/2023.

United States Department of Agriculture / Economic Research Service. Unemployment and median household income for the U.S., States, and counties, 2000–22. https://www.ers.usda.gov/data-products/county-level-data-sets/county-level-data-sets-download-data/. Accessed on 11/1/2023.

Centers for Disease Control and Prevention and Agency for Toxic Substances and Disease Registry / Geospatial Research, Analysis, and Services Program. CDC/ATSDR Social Vulnerability Index 2020. Database North Carolina. https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html. Accessed on 12/6/2023.

Centers for Disease Control and Prevention and Agency for Toxic Substances and Disease Registry. 2022 Environmental Justice Index. https://www.atsdr.cdc.gov/placeandhealth/eji/index.html. Accessed on 12/5/2023.

“CDC/ATSDR Environmental Justice Index (EJI) FAQ.” Centers for Disease Control and Prevention: Agency for Toxic Substances and Disease Registry, U.S. Department of Health & Human Services, 31 May 2023, www.atsdr.cdc.gov/placeandhealth/eji/faq_eji.html. Accessed on 12/11/2023.

“Human Health & Environmental Impacts of the Electric Power Sector.” Clean Air Power Sector Programs, United States Environmental Protection Agency (US EPA), 22 Feb. 2023, www.epa.gov/power-sector/human-health-environmental-impacts-electric-power-sector.

Zhu, Hao. “knitr::kable and kableExtra”. 19 Feb. 2021. https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html. Accessed on 12/13/2023.